[GLUTEN-11402][VL] Fix decimal partition key serialization to preserve scale#11618
Conversation
|
Run Gluten Clickhouse CI on x86 |
8d961b0 to
d0172fa
Compare
|
Run Gluten Clickhouse CI on x86 |
d0172fa to
f97d6e1
Compare
|
Run Gluten Clickhouse CI on x86 |
|
@baibaichen @zhouyuan can you please review? Thanks! |
|
@zhouyuan ping on a review for this, thanks 😊 |
|
@acvictor Thanks for the fix. The code looks good. However in the log, it seems there are still some fallback on scan reported, is this expected? |
f97d6e1 to
a15475e
Compare
|
Run Gluten Clickhouse CI on x86 |
@zhouyuan this is expected. Baseline This PR The Exchange/Project fallbacks with CheckOverflowInTableInsert are pre-existing on the INSERT path and the baseline also has this. This PR has more instances because I extended the test to go from 1 INSERT to 6 INSERTs to cover additional decimal scenarios. The logs do show an improvement from the baseline, because Scan parquet spark_catalog.default.dynparttest2 was previously falling back with "Unsupported decimal partition column in native scan." but in this PR, that scan fallback is eliminated. |
| DateFormatter.apply().format(pv.asInstanceOf[Integer]) | ||
| case _: DecimalType => | ||
| pv.asInstanceOf[Decimal].toJavaBigInteger.toString | ||
| pv.asInstanceOf[Decimal].toJavaBigDecimal.unscaledValue().toString |
There was a problem hiding this comment.
Could you explain why decimal partition keys are not supported?
There was a problem hiding this comment.
It's not that it's unsupported but rather results in incorrect casting (see #11618). The change is needed because Decimal.toJavaBigInteger truncates the fractional part, producing an incorrect unscaled value. For example, a Decimal("100.1") with scale=1 would serialize as "100" (the truncated BigInteger) instead of "1001" (the correct unscaled representation). This causes Velox reader to misinterpret decimal partition values, returning wrong query results.
a15475e to
55820f9
Compare
|
Run Gluten Clickhouse CI on x86 |
|
@zhouyuan does this change look good to you? |
What changes are proposed in this pull request?
This PR fixes decimal partition value serialization by replacing
toJavaBigInteger.toStringwithtoJavaBigDecimal.unscaledValue().toString, removes fallback guard that was added by #11518 and adds additional test cases to SQLQuerySuite covering small decimals, zero-scale decimals, negative values, and multi-partition pruning.How was this patch tested?
Existing UTs added in #11518 + extended
Incorrect decimal casting for partition readtestWas this patch authored or co-authored using generative AI tooling?
No
Related issue: #11402